Hybrid compiler/hardware prefetching for multiprocessors using low-overhead cache miss traps

نویسندگان

  • Jonas Skeppstedt
  • Michel Dubois
چکیده

We propose and evaluate a new data prefetching technique for cache coherent multiprocessors. Prefetches are issued by a prefetch engine which is controlled by the compiler. Second-level cache misses generate cache miss traps, and start the prefetch engine in a trap handler generated by the compiler. The only instruction overhead in our approach is when a trap handler terminates after data arrives. We present the functionality of the prefetch engine and a compiler algorithm to control it. We also study emulation of the prefetch engine in software. Our techniques are evaluated on six parallel applications using a compiler which incorporates our algorithm and a simulated multiprocessor. The prefetch engines remove up to 67% of the memory access stall time at an instruction overhead less than 0.42%. The emulated prefetch engines remove in general less stall time at a higher instruction overhead.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Preliminary Evaluation of Cache-miss-initiated Prefetching Techniques in Scalable Multiprocessors

Prefetching is an important technique for reducing the average latency of memory accesses in scalable cache-coherent multiprocessors. Aggressive prefetching can signiicantly reduce the number of cache misses, but may introduce bursty network and memory traac, and increase data sharing and cache pollution. Given that we anticipate enormous increases in both network bandwidth and latency, we exam...

متن کامل

Sequential Hardware Prefetching in Shared-Memory Multiprocessors

To offset the effect of read miss penalties on processor utilization in shared-memory multiprocessors, several softwareand hardware-based data prefetching schemes have been proposed. A major advantage of hardware techniques is that they need no support from the programmer or compiler. Sequential prefetching is a simple hardware-controlled prefetching technique which relies on the automatic pref...

متن کامل

Integrating Fine-Grained Message Passing in Cache Coherent Shared Memory Multiprocessors

This paper considers the use of data prefetching and an alternative mechanism, data forwarding, for reducing memory latency caused by interprocessor communication in cache coherent, shared memory multiprocessors. Data prefetching is accomplished by using a multiprocessor software pipelined algorithm. Data forwarding is used to target interprocessor data communication, rather than synchronizatio...

متن کامل

Compiler Support for Dynamic Speculative Pre-Execution

Speculative pre-execution is a promising prefetching technique which uses an auxiliary assisting thread in addition to the main program flow. A prefetching thread (p-thread), which contains the future probable cache miss instructions and backward slice, can run on the spare hardware context for data prefetching. Recently, various forms of speculative pre-execution have been developed, including...

متن کامل

Miss Penalty Reduction Using Bundled Capacity Prefetching in Multiprocessors

While prefetch has proven itself useful for reducing cache misses in multiprocessors, traffic is often increased due to extra unused prefetch data. Prefetching in multiprocessors can also increase the cache miss rate due to the false sharing caused by the larger pieces of data retrieved. The capacity prefetching strategy proposed in this paper is built on the assumption that prefetching is most...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1997